Performance optimization of an FPGA-based configurable multiprocessor for matrix operations

Authors

  • Xiaofang Wang
  • Sotirios G. Ziavras
Abstract

Several driving forces have recently brought about significant advances in the field of configurable computing. They have also enabled parallel processing within a single field-programmable gate array (FPGA) chip. The ever-increasing complexity of application algorithms and the supercomputing crisis have made this new parallel-processing approach more important and pertinent. Its cost-effectiveness provides system designers with the greatest flexibility while imposing many challenges to current hardware and software codesign methodologies. This paper explores practical hardware and software design and implementation issues for FPGA-based configurable multiprocessors, based on the authors' first-hand experience with a shared-memory implementation of parallel LU factorization for sparse block-diagonal-bordered (BDB) matrices. We also propose a new dynamic load balancing strategy for parallel LU factorization on our system. Performance results are included to prove the viability of this new multiprocessor design approach.

Similar resources

Improvement of Heterogeneous Systems Efficiency Using Self-Configurable FPGA-based Computing

Computer system performance is improved today using two major approaches: increasing the computing power of general-purpose computers (multicore processors, multiprocessor computer systems, supercomputers), and adapting the computer hardware to the executed algorithm (or class of algorithms). The latter approach often involves the application of ASIC-based and FPGA-based hardware accelera...

Address for Correspondence

Solving a system of linear equations is a key problem in engineering and science. Matrix factorization is a key component of many methods used to solve such equations. However, the factorization process is very time-consuming, so these problems have often been targeted for parallel machines rather than sequential ones. Nevertheless, commercially available supercomputers are expensive and only ...

Parallel solution of Newton’s power flow equations on configurable chips

The conventional Newton’s method (also known as Newton–Raphson method) for the AC power flow problem is preferred in some situations due to its local quadratic convergence. However, its high computation and memory requirements due to the required LU factorization of the Jacobian matrix at each iteration limit its practical employment in the online operation of very large systems. We produce her...

Hardware/Software Co-Configuration for Multiprocessor SoPC

Real-time operating systems (RTOS) for multiprocessor systems built on a single FPGA should be configurable to a wide range of architectures. Because the configuration of an RTOS depends on the hardware architecture, it is advantageous to co-configure the multiprocessor architecture and the RTOS simultaneously. This paper is a work-in-progress report of our research on configurable RTOS and co-configuration

Layout driven FPGA packing algorithm for performance optimization

An FPGA is a 2D array of configurable logic blocks. Packing groups logic elements into device-specific configurable logic blocks for subsequent placement. The traditional fixed delay model of inter- and intra-cluster delays used in packing does not represent post-placement delays and often leads to sub-optimal solutions. This paper presents a new layout-driven packing algorithm, named LDPack, b...

Publication date: 2003